IRIX Base Documentation 1998 November


Text File | 1998-11-02 | 37KB | 924 lines

Base Development 7.2.1 Release Notes

Document Number 008-1782-030

3. New Features of This Release

The features in this chapter are new or significantly changed in the Base Compiler Development software since the MIPSpro 7.1 release. Other older features of note are also discussed.

3.1 New Man Pages for MIPSpro 7.2.1

The opt(5), lno(5) and o32(5) man pages now provide information about their specific options. In the past, this information was bundled in the cc(1), CC(1), f77(1) and f90(1) man pages.

Also included in MIPSpro 7.2.1 are the omp_lock(3), omp_nested(3) and omp_threads(3) man pages which are useful when doing development for multiprocessors.

3.2 New Automatic Parallelization Option

The 7.2 release of the MIPSpro compilers marked a major revision of the auto-parallelizer. The new product incorporates automatic parallelization into the other optimizations performed by the MIPSpro compilers. Previous versions relied on preprocessors to provide source-to-source conversions prior to compilation.

This change provides several benefits to developers:

+o Automatic parallelization is integrated with optimizations for single processors

+o A set of options and pragmas consistent with the rest of the MIPSpro compilers

+o Better run-time and compile-time performance

For more information, please refer to the auto_p(5) man pages.

NOTE: In order to run the new automatic parallelization, you must purchase the MIPSpro Auto Parallelization Option (SC4-APO-7.2) and install the license for it (FEATURE name string = auto_pp).

3.3 Compiler System Changes

This section lists changes and additions to compilers and development tools since the MIPSpro 7.1 release.

3.3.1 New Options and Defaults in MIPSpro 7.2.1

The following new options which control the inlining of memory intrinsics have been added to the -OPT option group:

-OPT:....

mem_intrinsics[=(OFF|ON)]
    Enable inlining of memory intrinsics (memcpy, memmove, memset, bcopy, bzero, blkclr) in some cases. This option has an effect only if the corresponding procedure has a "#pragma intrinsic" for it. The standard include files contain this pragma for these routines (string.h, memory.h, bstring.h, strings.h). Note that the pragmas are disabled by default with the -ansi option. The option -D__INLINE_INTRINSICS can be used to enable intrinsics in the -ansi mode. (default OFF)

memcpy_cannot_overlap[(OFF|ON)]
    The compiler assumes by default that the operands of the "memcpy" routine can overlap. This option allows the compiler to assume that the operands do not overlap and can thus generate better code. (default OFF)

bcopy_cannot_overlap[(OFF|ON)]
    The compiler assumes by default that the operands of the "bcopy" routine can overlap. This option allows the compiler to assume that the operands do not overlap and can thus generate better code. (default OFF)

memmove_cannot_overlap[(OFF|ON)]
    The compiler assumes by default that the operands of the "memmove" routine can overlap. This option allows the compiler to assume that the operands do not overlap and can thus generate better code. (default OFF)

memmove_count=n
    Specify the maximum number of instructions that will be generated in the inline expansion for the memory intrinsics. (default 16)

3.3.2 -OPT:IEEE_comparisons=n option

For MIPSpro 7.2.1 the -OPT:IEEE_comparisons=n option has been renamed to -OPT:IEEE_NaN_inf=n, whose definition under the -OPT option control group is as follows:

IEEE_NaN_inf=n
    IEEE_NaN_inf=ON forces all operations which might have IEEE-754 NaN or infinity operands to yield results that conform to ANSI/IEEE 754-1985, the IEEE Standard for Binary Floating-point Arithmetic, which specifies the standard behavior for NaN and infinity operands. Specify ON or OFF as the setting. The default is IEEE_NaN_inf=OFF.

    IEEE_NaN_inf=OFF produces non-IEEE results for various operations. For example, the comparison x==x is treated as TRUE without executing a test, and x/x will be simplified to 1 without dividing. Turning this option on may suppress many such common optimizations and hurt performance as a result.

For more information please consult the new man page opt(5).

3.3.3 New Options and Defaults in MIPSpro 7.2

A new -DEBUG:option control group has been created to allow insertion of code to assist in the debugging of programs. For example, -DEBUG:div_check=N replaces -TENV:check_div=N and the 7.2 compiler, by default, inserts code to check for divide by zero (N=1).

NOTE: The default value for -TENV:check_div=N under MIPSpro 7.1 was N=0 (no checks).

For more information, please refer to the cc(1) and DEBUG_group(5) man pages.

The -LIST: options control group has been enhanced to create a listing file (.l) that contains the values of all flags modified, either directly on the command line or indirectly as a side effect of other options. For example:

    % cc -n32 -LIST:options=ON foo.c

will create foo.l, which contains a listing of the default values of certain options from the -OPT, -LNO, -TARG and -TENV option control groups.
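
To make the IEEE_NaN_inf discussion in section 3.3.2 concrete, the following C fragment is a minimal sketch of the two expressions those notes mention, x==x and x/x. The function names are hypothetical and chosen only for illustration. The sketch assumes a compiler that honors IEEE 754 semantics (the behavior the notes describe for IEEE_NaN_inf=ON); with IEEE_NaN_inf=OFF the notes say such expressions may be simplified, so results for NaN or infinite inputs could differ.

```c
/* Hypothetical helper: returns 1 when x is a NaN.
 * Under IEEE 754, a NaN compares unequal to everything, itself included,
 * so the comparison x == x cannot simply be assumed TRUE. */
int is_nan_by_self_compare(double x)
{
    volatile double v = x;  /* volatile keeps the comparison at run time */
    return v != v;
}

/* Hypothetical helper: returns x / x computed at run time.
 * Under IEEE 754 this is 1.0 for ordinary nonzero x, but NaN when x is
 * 0.0, so x / x cannot simply be replaced by 1. */
double self_ratio(double x)
{
    volatile double v = x;  /* volatile keeps the division at run time */
    return v / v;
}
```

A caller that depends on these results for NaN or infinite operands is the kind of code for which IEEE_NaN_inf=ON is relevant; code that never sees such operands is unaffected by the simplifications.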

To list every option instead, the following command:

    % cc -n32 -LIST:all_options=ON foo.c

will create foo.l, which contains a listing of the default values of all options from all of the option control groups. For more information, please refer to the cc(1) man page.

3.3.4 Obsolete Options

Several compile-time flags have been obsoleted. These include: -TENV:misalignment=N, -TENV:align_extern=N and -TENV:aligned=TRUE. Their use will generate a warning message in both the compiler front-end and back-end. For example:

    % cc -n32 -TENV:misalignment=3 reshape.c
    Warning: Obsolete option "-TENV:misalignment=3" -- ignored
    Warning: Obsolete option "-TENV:misalignment=3" -- ignored

The -TENV:varargs_prototypes=TRUE flag has been replaced by -DEBUG:varargs_prototypes=TRUE. For more information, please refer to the cc(1) and DEBUG_group(5) man pages.

3.3.5 Compiler Defaults

When invoking the compiler, -32 -mips2 is assumed on all machines except those based on the R8000 processor. There, the compilations default to -64 -mips4. These defaults can, of course, be overridden at the command line or through the use of the SGI_ABI environment variable. For more information on these flags, please refer to the cc(1), f77(1) and abi(5) man pages.

The MIPSpro 7.1 compiler introduced a new method by which the user can customize the Application Binary Interface (ABI), instruction set architecture (ISA) and processor type used in compilations where they are not explicitly specified. Under this method, the COMPILER_DEFAULTS_PATH environment variable can be set to a colon-separated list of paths where the compiler will look for a compiler.defaults file.
If no compiler.defaults file is found or if the environment variable is not set, the compiler looks for /etc/compiler.defaults. If that file is not found either, the compiler resorts to the built-in defaults described in the cc(1) man pages and above. For a description of the specification format of this file, please refer to the cc(1) man pages.

3.3.6 WHIRL Intermediate Object File Format Changes

The format of WHIRL Intermediate Object files has changed. If you have WHIRL intermediate (.o) files left over from compilations using MIPSpro 7.1 with interprocedural optimization enabled (i.e. -IPA), you must recompile the entire set of files. WHIRL Intermediate Object files are compatible between MIPSpro 7.2.1 and MIPSpro 7.2.

3.3.7 ABI Development

For information about ABI development issues, see the man pages abicc(1), abild(1), check_abi_compliance, check_abi_interface and check_for_syscalls.

3.3.8 Controlling CG compiler optimizations

CG is the code-generation part of the compiler. There are choices to be made in many parts of CG, e.g. what conditional constructs should be if-converted, or how much should a loop be unrolled. In most cases the compiler should be making reasonable decisions. But there are still times when performance can be improved by modifying the default behavior. The following sections describe a few of the ways that CG can be controlled by the user.

+o Non-loop if-conversion can be turned off with -CG:ifc_non_loop=off. Currently non-loop if-conversion only applies to simple if-then or if-then-else constructs with a very few conditionally executed instructions, so it should usually be advantageous to do the if-conversion. (In fact we may increase the amount of this kind of if-conversion we do in future releases).
One reason the if-conversion could be sub-optimal is that one of the two paths through the code might be rarely executed. (This can be controlled with -CG:body_freq_fb=n. If some block in the loop has frequency less than 1/n times the frequency of the loop head, if-conversion for that loop is disabled.)

+o If-conversion of innermost loops is disabled with -CG:if_conversion=off. The most likely reason that this would be useful is that if-conversion has increased the number of instructions in the loop by a lot, and the loop is Software Pipelined, so that there is no opportunity for reverse_if_conversion to undo the damage. Another way to protect against this possibility is by setting the value of -CG:body_ifc_ratio. For example, if -CG:body_ifc_ratio=2, and the number of instructions in the loop grows by more than a factor of 2 due to if-conversion, then the if-conversion will be undone (and of course there will then be no opportunity to Software Pipeline that loop).

+o Cross iteration optimizations can be disabled with -CG:vector_rw_removal=off (for read-read or read-write optimizations), -CG:vector_ww_removal=off (for write-write optimizations), and/or -CG:cross_iter_cse_removal=off (for common sub-expression elimination). The reason to do this is that these optimizations can increase register pressure.

+o The unroll amount may be increased or decreased. There is a heuristic controlled by -OPT:unroll_analysis (on by default) which is generally trying to minimize unrolling, because less unrolling leads to smaller code size and faster compilation. Usually the only thing that makes it unroll too much is its attempt to minimize the cost of penalties for taken branches.
If you set the penalty for such a branch to 0 (-CG:branch_taken_penalty=0), or increase the cost for taken branches that the heuristic will tolerate (increase the value of -CG:unroll_analysis_threshold from its default value of 0.1), you can probably avoid having loops unrolled too much. You can also change the upper bound for the amount of unrolling with -OPT:unroll_times (default is 8) or -OPT:unroll_size (the number of instructions in the unrolled body, current default is 80). In case the heuristic is limiting unrolling too much, it can be disabled with -OPT:unroll_analysis=off.

+o Software Pipelining can be disabled with -OPT:swp=off. As far as CG is concerned, -O3 -OPT:swp=off is the same as -O2. However, since LNO does not run at -O2, the input to CG can be very different, and the available aliasing information can be very different.

+o Reverse_if_conversion for non SWP'd loops can be disabled with -CG:reverse_if=off.

3.4 Changes to dbx(1)

+o dbx has been enhanced to allow debugging of Fortran 90 allocatable arrays.

+o dbx has been enhanced to allow debugging of Fortran 90 assumed shape arrays.

+o dbx has been enhanced to allow debugging of C++ programs created with the -gslim option. This option limits the amount of debugging information generated by the C++ compiler for class definitions. You should consider using this option on large applications when you experience bloated object files, executables, or DSOs when compiling with -g. For more information refer to the CC(1) man pages.

+o dbx has been enhanced to allow debugging of C++ code that contains exception handlers. For more information, refer to the DBX User's Guide.

+o dbx has been enhanced to support pthreads debugging. For more information, refer to the DBX User's Guide.
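
As background for the if-conversion controls in section 3.3.8, the following C fragment is a hand-written sketch of what if-conversion means for a simple if-then construct: the conditionally executed assignment is replaced by a branch-free select. This is an analogy at the source level only, not the code that CG generates, and the function names are hypothetical.

```c
/* Branching form: a simple if-then with one conditionally
 * executed assignment. */
int clamp_branch(int x, int limit)
{
    if (x > limit)
        x = limit;
    return x;
}

/* Hand-written "if-converted" form: the condition becomes a mask and
 * the result is selected without a branch.  Assumes two's complement
 * integers, so that -(1) is a word of all one bits. */
int clamp_select(int x, int limit)
{
    int take_limit = -(x > limit);   /* all ones if x > limit, else 0 */
    return (limit & take_limit) | (x & ~take_limit);
}
```

Both functions return the same value for every input. The select form always executes all of its instructions, which is why if-conversion can be a loss when one path is rarely taken, as the notes on -CG:body_freq_fb point out.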

3.5 Changes to the linker ld(1)

This linker provides some new features and better performance. For more information refer to the ld(1) man page.

+o As of release 5.0.1, the linker can adjust executables to avoid certain problems with early versions of the R4000. If the -no_jump_at_eop flag is on (it is on by default), small amounts of padding are added between component objects to avoid placing a branch instruction at the end of a page. Slightly smaller executables and significantly faster executables can result by turning this option off (using the -allow_jump_at_eop flag). Binaries built either way should be compatible across all Silicon Graphics systems, but those made with -no_jump_at_eop (the default) often show performance gains on R4000 systems.

   These flags are irrelevant for programs compiled with -mips4 because the R8000 and R10000 processors do not have this hardware bug and no padding is performed by the linker. However, early versions of the R5000 may have problems if a jump or branch instruction occurs at an address 8 bytes before the end of an odd-numbered page, and if a load or store instruction immediately follows the jump or branch instruction. The 6.2 and above releases of the linker work around this problem by padding sections of object files that exhibit the characteristics described above. This occurs by default for object files compiled for -mips4. Binaries can be built without this fix by using the -allow_r5k_jump_at_eop option.

+o The 6.2 release of the linker introduced an experimental new feature enabled by the -multigot option.
If you experience GOT Overflow problems in building your applications, you should try relinking with the -multigot option as an alternative to recompiling with -xgot. NOTE: The 7.2.1 release of the linker enables -multigot by default.

+o New options have been added to ld(1) for aligning variables in the global uninitialized data area (bss). See the manual page for ld(1) for options with names beginning with -X. These new options are unique to IRIX and might change across releases.

3.6 Assembler (as(1))

+o As of 6.2, the assembler supports 64-bit instructions, and can generate 64-bit ELF object files. The COFF format is not supported. The 64-bit and N32 objects contain DWARF debugging support rather than MDEBUG.

+o The calling conventions and register usage for 64-bit objects are different from the 32-bit conventions, so you should become familiar with the new conventions. The MIPSpro 64-Bit Porting and Transition Guide is useful for porting code from 32 to 64 bits. Also see the standard include files <regdef.h> and <asm.h>, which are parameterized for 32-bit or 64-bit code.

+o Most of the optimizations like software pipelining and cross-basic-block scheduling have been removed from the assembler; these optimizations are now done in the back-end, and thus only happen for high-level code. The assembler still does instruction scheduling for the user.

+o There are three new assembler directives for the generation of 64-bit PIC (Position-Independent Code). These directives are ignored if not doing a 64-bit shared (PIC) compile.

+o .cpsetup reg, reg2/offset, label

   By convention, reg == t9, and the label is the procedure entry. The second argument can be either another register (for the case of a leaf routine with no frame) or a stack offset, and is used to store the value of $gp.
   This directive expands into:

       sd     gp, offset(sp)
       lui    gp, %hi(%gp_rel(label))
       daddiu gp, gp, %lo(%gp_rel(label))
       daddu  gp, gp, reg

+o .cpreturn

   This directive expands into:

       ld     gp, offset(sp)

   where "offset" is the same value used in the previous .cpsetup.

+o The .cpsetup/.cpreturn sequence replaces the .cpload/.cprestore sequence that is used in 32-bit PIC code.

+o The other new directive is

       .cplocal reg1

   It specifies a register (typically not $gp) to be used as context pointer. It has effect only within a procedure (i.e., it is turned off automatically at the end of each procedure).

There are two new directives in the 7.00 -n32/-64 assembler:

+o .dynsym name value

   This specifies the st_other field of the symbol, which can be "sto_default", "sto_internal", "sto_hidden", or "sto_protected".

+o .gpvalue value

   The gp value is used in %gp_rel relocations as an offset for the addend. By default the value is 0.

Chapter 8 of the MIPSpro Assembly Language Guide contains descriptions of all of the directives supported by the assembler.

+o The MIPSpro N32 ABI Handbook and the MIPSpro 64-bit Porting and Transition Guide contain examples of how to write assembly language programs for the N32 and 64-bit ABIs respectively.

3.7 Libraries

The following changes to the libraries that are part of the compiler system were made in the 7.1 release.

3.7.1 Repackaging of N32 Subsystems

The compiler_dev.sw32 subsystems have been bundled into the compiler_dev.sw subsystems and are no longer present as independent subsystems.

3.7.2 Discontinuance of N32 and 64-bit Non-shared Libraries

N32 and 64-bit versions of non-shared libraries for SPEC (compiler_dev.sw32.speclib and compiler_dev.sw64.speclib) are no longer being shipped.

The following changes to the libraries that are part of the compiler system were made in the 6.2 release.

+o The floating point exception handler package (libfpe) has been rewritten and released with support for programs compiled under -mips3 or -mips4. Refer to the fsigfpe(3f) and handle_sigfpes(3c) man pages.

+o Fast floating point libraries (libfastm) tuned for the R5000, R8000 and R10000, respectively, are now available when doing compilation for the 64-bit and N32 ABIs. New -r5000, -r8000 and -r10000 compiler flags are provided which add the paths of these libraries to the head of the library search path. For more information refer to the cc(1) and f77(1) man pages.

3.8 Performance Tools

This section includes changes to pixie(1), pixstats(1), and prof(1).

+o As of the 7.0 release, pixie(1), pixstats(1), and prof(1) are no longer supported. Their functionality has been integrated into a new product called SpeedShop. Interested users are referred to SpeedShop's release notes for more information.

3.9 Library and System Call Functionality

The following additions and changes were made to library and system call functionality between versions 5.3 and 6.2 of the IRIS Development Option (now being replaced by the IRIX Development Foundation).

+o The MIPSpro C compiler supports long double arithmetic using the ANSI C standard syntax. Most of the standard transcendental functions in libm and libc are supported. See specific man pages for names and prototypes. Most of the long double routines are named by prefixing the letter 'q' to the double precision routine's name; for example, qsin is the long double version of sin. The following long double routines are NOT supported in this release: acosh, asinh, atanh, cbrt, drand48, drem, erand48, expm1. See the man page for math(3M) for details regarding long double arithmetic.

   Note that long double operations on this system are only supported in "round to nearest rounding" mode (the default). The system must be in "round to nearest rounding" mode when issuing long double arithmetic operations or calling any of the long double functions, or incorrect answers will result.
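
The rounding-mode requirement above can be guarded against in code. The following is a minimal sketch, not an IRIX-specific recipe: it uses the ANSI C macro FLT_ROUNDS from <float.h>, which evaluates to 1 for round-to-nearest, and both function names are hypothetical. Whether FLT_ROUNDS tracks run-time changes of the rounding mode depends on the implementation, so treat the check as an assumption to verify on the target system.

```c
#include <float.h>

/* Returns 1 when FLT_ROUNDS reports round-to-nearest (value 1). */
int rounding_is_nearest(void)
{
    return FLT_ROUNDS == 1;
}

/* Hypothetical helper: sums n doubles in long double precision.
 * Returns 0 on success and stores the total in *sum.
 * Returns -1, leaving *sum untouched, if the rounding mode is not
 * round-to-nearest, since long double results would be unreliable. */
int long_double_sum(const double *values, int n, long double *sum)
{
    long double acc = 0.0L;
    int i;

    if (!rounding_is_nearest())
        return -1;
    for (i = 0; i < n; i++)
        acc += (long double)values[i];
    *sum = acc;
    return 0;
}
```

A caller would check the return value before using the sum, for example by falling back to double precision arithmetic when the helper reports -1.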